
    Learning Affect with Distributional Semantic Models

    The affective content of a text depends on the valence and emotion values of its words. At the same time, a word's distributional properties deeply influence its affective content. For instance, a word may become negatively loaded because it tends to co-occur with other negative expressions. Lexical affective values are used as features in sentiment analysis systems and are typically estimated with hand-made resources (e.g. WordNet Affect), which have limited coverage. In this paper we show how distributional semantic models can effectively be used to bootstrap emotive embeddings for Italian words and then to compute affective scores with respect to eight basic emotions. We also show how these emotive scores can be used to learn the positive vs. negative valence of words and to model behavioral data.
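
    A minimal sketch of the centroid-based idea described above, not the paper's implementation: each basic emotion is represented by the average embedding of a few seed words, and a word's affective score for an emotion is its cosine similarity to that centroid. The emotion inventory, seed lists, and `embeddings` lookup below are illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def emotion_centroids(seeds, embeddings):
    """Average the seed-word vectors of each emotion into a single centroid."""
    return {emotion: np.mean([embeddings[w] for w in words if w in embeddings], axis=0)
            for emotion, words in seeds.items()}

def affective_scores(word, centroids, embeddings):
    """Score a word against every emotion centroid by cosine similarity."""
    return {emotion: cosine(embeddings[word], centroid)
            for emotion, centroid in centroids.items()}

def valence(scores, positive=("joy", "trust"), negative=("anger", "fear", "sadness", "disgust")):
    """Derive a coarse positive-vs-negative valence from the emotive scores
    (a simplifying assumption, not the paper's exact model)."""
    return (sum(scores[e] for e in positive) / len(positive)
            - sum(scores[e] for e in negative) / len(negative))

# Toy example with 3-dimensional vectors (real distributional models use hundreds of dimensions).
embeddings = {
    "gioia": np.array([0.9, 0.1, 0.0]), "felice": np.array([0.8, 0.2, 0.1]),
    "paura": np.array([0.1, 0.9, 0.2]), "terrore": np.array([0.0, 0.8, 0.3]),
    "sorpresa": np.array([0.4, 0.4, 0.6]),
}
centroids = emotion_centroids({"joy": ["gioia", "felice"], "fear": ["paura", "terrore"]}, embeddings)
print(affective_scores("sorpresa", centroids, embeddings))
```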

    Da Facebook a Twitter: Creazione e utilizzo di una risorsa lessicale emotiva per la sentiment analysis di tweet

    This thesis presents research carried out in the field of sentiment analysis and emotion detection on text produced on social networks. In particular, the goal was to build a system for the automatic classification of tweet polarity, starting from the Sentiment Polarity Classification (SENTIPOLC) task proposed by Evalita for its 2014 edition. The classifier was developed using an SVM able to take into account both the lexical and non-lexical features present in the text and the emotive component that the text itself conveys. For the emotive component, a distributional emotive space was built from textual data collected from the social networking platform Facebook.
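
    A minimal sketch of the kind of classifier described above, under assumed data and feature names: bag-of-words lexical features combined with emotive scores looked up in a distributional emotive lexicon, fed to a linear SVM. The `emotive_scores` lookup and the emotion set are hypothetical placeholders for the Facebook-derived resource.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

EMOTIONS = ["joy", "anger", "fear", "sadness"]  # illustrative subset, not the thesis' inventory
# Hypothetical emotive lexicon: word -> per-emotion scores from the distributional space.
emotive_scores = {"felice": {"joy": 0.9, "anger": 0.1, "fear": 0.1, "sadness": 0.1}}

def emotive_features(tweets):
    """Average the per-emotion scores of the known words in each tweet."""
    rows = []
    for tweet in tweets:
        scores = [emotive_scores[w] for w in tweet.lower().split() if w in emotive_scores]
        if scores:
            rows.append([np.mean([s[e] for s in scores]) for e in EMOTIONS])
        else:
            rows.append([0.0] * len(EMOTIONS))
    return np.array(rows)

model = Pipeline([
    ("features", FeatureUnion([
        ("lexical", TfidfVectorizer(ngram_range=(1, 2))),                 # lexical features
        ("emotive", FunctionTransformer(emotive_features, validate=False)),  # emotive features
    ])),
    ("svm", LinearSVC()),
])

# model.fit(train_tweets, train_labels)   # e.g. labels in {"positive", "negative"}
```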

    CAPISCO@CONcreTEXT 2020: (Un)supervised Systems to Contextualize Concreteness with Norming Data

    This paper describes several approaches to the automatic rating of the concreteness of concepts in context, developed for the EVALITA 2020 “CONcreTEXT” task. Our systems focus on the interplay between words and their surrounding context by (i) exploiting annotated resources, (ii) using BERT masking to find potential substitutes of the target in specific contexts and measuring their average similarity with concrete and abstract centroids, and (iii) automatically generating labelled datasets to fine-tune transformer models for regression. All the approaches have been tested on both English and Italian data. The best system for each language ranked second in the task.
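
    A minimal sketch of approach (ii) as summarised above: mask the target word in its context, ask a BERT fill-mask model for likely substitutes, and compare their average vector to "concrete" and "abstract" centroids. The seed lists, the static `vectors` lookup, and the model choice are assumptions, not the paper's actual resources.

```python
import numpy as np
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")  # assumed English model

def centroid(words, vectors):
    """Average vector of the seed words that are present in the lookup."""
    return np.mean([vectors[w] for w in words if w in vectors], axis=0)

def concreteness_score(sentence, target, vectors, concrete_seeds, abstract_seeds, top_k=10):
    """Higher values mean the target behaves more like the concrete seed words."""
    masked = sentence.replace(target, fill_mask.tokenizer.mask_token, 1)
    substitutes = [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]
    sub_vecs = [vectors[w] for w in substitutes if w in vectors]
    if not sub_vecs:
        return 0.0
    avg = np.mean(sub_vecs, axis=0)
    cos = lambda u, v: float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return cos(avg, centroid(concrete_seeds, vectors)) - cos(avg, centroid(abstract_seeds, vectors))

# Usage (with a pre-loaded word-vector dictionary `vectors`):
# concreteness_score("She put the book on the table.", "book", vectors,
#                    concrete_seeds=["stone", "chair", "dog"],
#                    abstract_seeds=["idea", "freedom", "hope"])
```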

    UNIPI-NLE at CheckThat! 2020: Approaching Fact Checking from a Sentence Similarity Perspective Through the Lens of Transformers

    This paper describes a fact-checking system based on a combination of Information Extraction and Deep Learning strategies, developed for the "Verified Claim Retrieval" task (Task 2) of the CheckThat! 2020 evaluation campaign. The system is based on two main assumptions: a claim that verifies a tweet is expected (i) to mention the same entities and keyphrases, and (ii) to have a similar meaning. The former assumption has been addressed by exploiting an Information Extraction module capable of determining the pairs in which the tweet and the claim share at least a named entity or a relevant keyword. To address the latter, we exploited Deep Learning to refine the computation of the text similarity between a tweet and a claim, and to classify the pairs as correct matches or not. In particular, the system was built starting from a pre-trained Sentence-BERT model, on which two cascaded fine-tuning steps were applied in order to (i) assign a higher cosine similarity to gold pairs, and (ii) classify a pair as correct or not. The final ranking produced by the system is based on the probability of a pair being labelled as correct. Overall, the system reached a MAP@5 of 0.91 on the test set.
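
    A minimal sketch of the sentence-similarity view of the task: encode the tweet and the candidate verified claims with a Sentence-BERT model and rank claims by cosine similarity. The model name and the example texts are assumptions, and the paper's entity/keyphrase filter and two fine-tuning steps are omitted for brevity.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # any pre-trained SBERT checkpoint

tweet = "COVID-19 can be cured by drinking bleach, says viral post."
claims = [
    "Drinking bleach does not cure COVID-19.",
    "The Eiffel Tower was completed in 1889.",
]

tweet_emb = model.encode(tweet, convert_to_tensor=True)
claim_embs = model.encode(claims, convert_to_tensor=True)

scores = util.cos_sim(tweet_emb, claim_embs)[0]   # one similarity score per claim
ranking = sorted(zip(claims, scores.tolist()), key=lambda p: p[1], reverse=True)
for claim, score in ranking[:5]:
    print(f"{score:.3f}  {claim}")
```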

    CoreNLP-it: A UD pipeline for Italian based on Stanford CoreNLP

    This paper describes a collection of modules for Italian language processing based on CoreNLP and Universal Dependencies (UD). The software will be freely available for download under the GNU General Public License (GNU GPL). Given the flexibility of the framework, it is easily adaptable to new languages provided with a UD treebank.
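
    A hypothetical usage sketch, not taken from the paper: querying a CoreNLP server loaded with Italian models through the stanza client. The property names are standard CoreNLP options, but the model paths are placeholders and the exact configuration shipped by CoreNLP-it may differ.

```python
from stanza.server import CoreNLPClient

italian_props = {
    "annotators": "tokenize,ssplit,pos,lemma,depparse",
    "pos.model": "path/to/italian-pos.tagger",        # placeholder path (assumption)
    "depparse.model": "path/to/italian-depparse.gz",  # placeholder path (assumption)
}

# Requires a local CoreNLP installation with the Italian models on its classpath.
with CoreNLPClient(properties=italian_props, memory="4G", be_quiet=True) as client:
    doc = client.annotate("Il gatto dorme sul divano.")
    for sentence in doc.sentence:
        for token in sentence.token:
            print(token.word, token.pos, token.lemma)
```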

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).